Abstract

The annotation of the results of database transformations
has been shown to be very effective for various applications.
Until recently, most work in this context focused
on positive query languages. The provenance semiring
framework is a particular approach that has proven effective
for these languages, and it was shown that when propagating
provenance with semirings, the expected equivalence
axioms of the corresponding query languages are satisfied.
There have been several attempts to extend the frame- 
work to account for relational algebra queries with dif- 
ference. We show here that these suggestions fail to sat- 
isfy some expected equivalence axioms (that in particular 
hold for queries on "standard" set and bag databases). In- 
terestingly, we show that this is not a pitfall of these par- 
ticular attempts, but rather every such attempt is bound 
to fail in satisfying these axioms, for some semirings. 
Finally, we show particular semirings for which an ex- 
tension for supporting difference is (im)possible. 

1 Introduction 

The annotation of the results of database transformations
with provenance information has quite a few applications.
Recent work [16, 12, 14] has proposed a framework of semiring
annotations. The idea is to associate every tuple of the
database with an element of a semiring K, and to propagate
the annotations through query evaluation. This
means that query constructs (of some expressiveness) are 
associated with operations in the semiring. For instance, 
the semiring addition corresponds to alternative deriva- 
tion of a tuple, and thus e.g. union of two relations corre- 
sponds to adding up the annotations of tuples appearing 
in both; multiplication corresponds to joint derivation, 
thus a tuple appearing in the result of a relational join
has an annotation that is the product of the annotations
of the two tuples that were joined to obtain it.

An important feature that guides the research on
semiring-based provenance is that of algebraic uniformity,
that is, the propagation of provenance through query
evaluation is defined using only the semiring operations
addition and multiplication (and the constants 0 and 1).
This uniformity is fundamental since it allows provenance
management to work with abstract annotations for tuples
(more concretely, the free semiring of polynomials), and
to specialize to any semiring with "concrete" annotations
(tuple multiplicity, access control levels, cost, etc.
[16, 12, 14]) when this information becomes available.
Using this perspective, the above papers have developed the
framework for positive relational algebra (as well as for
positive datalog and positive NRC). They have shown that
for these languages, the semiring interpretation satisfies
the expected axioms (e.g. the axioms that hold for set and
bag semantics), for every choice of semiring. Moreover, the
semiring axioms are forced by those of the corresponding
algebras [16].

To reach beyond positive queries one would like to deal
with relational difference. Classical work on incomplete
databases [19] already provides a solution for set
semantics, but dealing with both set and bag semantics is
mentioned in [16, 12] as a puzzling open problem. A
breakthrough was obtained by Geerts and Poggi [13], who
also made connections with earlier work that we further
exploit here. Other semantics have also emerged since
[15, 4], and we consider them below. Each such work has
made a particular choice of semantics for provenance-aware
relational difference. These semantics are all different,
and in particular induce different axioms of query
equivalence. Unfortunately, as we show below, for each of
these definitions, some "natural" axioms that are expected
to hold fail in general.

Consequently, we take here a different approach. Instead
of suggesting a particular semantics for provenance-aware
relational algebra with difference, we formulate a desired
(sub)set of query equivalence axioms that are expected to
hold for any such semantics, and ask: can one extend the
semiring framework (entailed by a subset of the axioms) to
define a semantics for which these axioms will hold, for
every choice of semiring? The main result of this paper is
that the answer to this question is no, for some very
useful semirings.

While this is a negative result, we believe that it is not 
the final word in the research on extending the prove- 
nance semiring framework to queries with difference. 
Our result indicates that, unlike the case of positive re- 
lational algebra, a one-size-fits-all semantics is unlikely 
to exist for queries with difference. One possible solution 
is to choose particular semirings that can be extended to 
account for difference, while satisfying the desired ax- 
ioms. We demonstrate this for a particular application of 
provenance, namely access control. 

The rest of this paper is organized as follows. In
Section 2 we recall the correspondence between the semiring
axioms and those of the positive relational algebra. Then,
in Section 3 we study the introduction of the difference
operation. We review related work in Section 4 and
conclude in Section 5.

2 The Positive Relational Algebra 

Our starting point is the technique of algebraic
representation of annotation propagation that was
introduced in [16]. This technique begins by assuming that
the space K of annotations is equipped with two operations
$+$, $\cdot$ and two constants 0, 1 used to define a semantics
for positive relational algebra (SPJU) on K-relations, i.e.,
relations whose tuples are annotated with elements from K.

To define annotated relations we use the named perspective
of the relational model [1]. Fix a countably infinite
domain $\mathbb{D}$ of values (constants). For any finite set $U$
of attributes a tuple is a function $t : U \to \mathbb{D}$ and we
denote the set of all tuples by $\mathbb{D}^U$. Given
$(K, +, \cdot, 0, 1)$, a K-relation (with schema $U$) is a
function $R : \mathbb{D}^U \to K$ whose support,
$\mathrm{supp}(R) = \{t \mid R(t) \neq 0\}$, is finite. For a fixed
set $U$ we denote by K-Rel (when $U$ is clear from the
context) the set of K-relations with schema $U$. We use the
notation $t|_U$ for the restriction of the tuple $t$ to the
attributes of $U$. We can then define the semantics of
every relational algebra operator on K-Rel. Due to lack of
space we repeat only two of the definitions, referring the
reader to [16] for the others.

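Union If $R_i : \mathbb{D}^U \to K$, $i = 1, 2$, then
$R_1 \cup R_2 : \mathbb{D}^U \to K$ is defined by
$(R_1 \cup R_2)(t) = R_1(t) + R_2(t)$.

Natural Join If $R_i : \mathbb{D}^{U_i} \to K$, $i = 1, 2$, then
$R_1 \bowtie R_2 : \mathbb{D}^{U_1 \cup U_2} \to K$ is defined by
$(R_1 \bowtie R_2)(t) = R_1(t_1) \cdot R_2(t_2)$, where
$t_1 = t|_{U_1}$ and $t_2 = t|_{U_2}$.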
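The two definitions can be illustrated with a minimal sketch. It assumes that a K-relation is stored as a Python dictionary mapping each tuple in its support to a nonzero annotation; the names `Semiring`, `tup`, `union` and `join` are hypothetical helpers introduced here for illustration and do not come from [16].

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (K, +, ., 0, 1); illustrative helper type."""
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    zero: Any
    one: Any


NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)             # bag semantics
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)  # set semantics


def tup(**attrs):
    """A tuple in the named perspective: an immutable attribute -> value map."""
    return frozenset(attrs.items())


def union(K, r1, r2):
    """(R1 u R2)(t) = R1(t) + R2(t); tuples outside the support carry K.zero."""
    out = {}
    for t in set(r1) | set(r2):
        ann = K.add(r1.get(t, K.zero), r2.get(t, K.zero))
        if ann != K.zero:          # keep only the support
            out[t] = ann
    return out


def join(K, r1, r2):
    """(R1 join R2)(t) = R1(t|U1) . R2(t|U2) for pairs agreeing on shared attributes."""
    out = {}
    for t1, a1 in r1.items():
        for t2, a2 in r2.items():
            d1, d2 = dict(t1), dict(t2)
            if all(d1[k] == d2[k] for k in d1.keys() & d2.keys()):
                ann = K.mul(a1, a2)
                if ann != K.zero:
                    out[frozenset({**d1, **d2}.items())] = ann
    return out


R = {tup(a=1, b=2): 2, tup(a=1, b=3): 1}
S = {tup(b=2, c=5): 3}
print(join(NAT, R, S))
print(union(NAT, R, {tup(a=1, b=2): 5}))
```

Swapping `NAT` for `BOOL` in the calls gives set semantics with the same code, which is the algebraic uniformity discussed in the introduction.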

As stated in [16], requiring that this semantics satisfy
the relational algebra identities in Figure 1(a) is
equivalent to $(K, +, \cdot, 0, 1)$ satisfying the equational
axiomatization in Figure 1(b), i.e., forming a specific
algebraic structure called a commutative semiring. The
correspondence is very tight: for $n = 1, \ldots, 8$,
K-relations satisfy identity I$n$ iff $(K, +, \cdot, 0, 1)$
satisfies axiom A$n$.

Figure 1: K-relational algebra identities and algebraic
axiomatization for the space K of annotations

Why the relational algebra identities in Figure 1(a)?
We rely on two important cases, namely set and bag
semantics, corresponding to the commutative semirings
$(\mathbb{B}, \vee, \wedge, \bot, \top)$ and $(\mathbb{N}, +, \cdot, 0, 1)$,
respectively. The identities in Figure 1(a) hold in both
cases. A second argument is that many more relational
algebra identities (omitted here) for projection and
selection already follow from A1-A8.

3 Adding Relational Difference 

Consider now the full relational algebra, i.e., the
positive algebra we already dealt with together with the
relational difference operator. A natural approach to
propagating annotations through difference [13] is to add
an algebraic operation $-$ to the semiring structure and to
define

Difference If $R_1, R_2 : \mathbb{D}^U \to K$ then
$R_1 - R_2 : \mathbb{D}^U \to K$ is defined by
$(R_1 - R_2)(t) = R_1(t) - R_2(t)$.

Following the same approach, we now search for an
equational axiomatization for $(K, +, \cdot, 0, 1, -)$.

(A11) $a + (b - a) = b + (a - b)$
(A12) $a - (b + c) = (a - b) - c$
(A13) $a \cdot (b - c) = a \cdot b - a \cdot c$
(b)

Figure 2: Extending the identities and axiomatization
for relational algebra including difference

Similarly to our treatment of the positive relational
algebra, we consider additional identities I9-I13 that hold
for both set and bag semantics, involving relational
difference. As before, these identities correspond to the
equational axioms A9-A13 for $(K, +, \cdot, 0, 1, -)$, depicted
in Figure 2(b). Again as before, we can state additional
identities involving difference and projection or
difference and selection, e.g.,
$\sigma_P(R - S) = \sigma_P R - \sigma_P S$; and again these follow
from A9-A13.

Next we show that satisfying axioms A1-A12 is relatively
"easy" but further satisfying A13 complicates matters
considerably. Indeed, by a result of Bosbach [6], axioms
A1-A12 characterize the notion of monus-semiring [13]
(m-semiring for short).¹

For a commutative semiring $(K, +, \cdot, 0, 1)$, the structure
$(K, +, 0)$ is a commutative monoid. To define m-semirings
we need the following.

Definition 3.1 Let $(K, +, 0)$ be a commutative monoid.
Define

$a \le b \iff \exists c \; a + c = b$

When $\le$ is an order relation it is called the natural order
on K and the monoid K is said to be naturally ordered.

Examples of naturally ordered commutative monoids are the
natural numbers $(\mathbb{N}, +, 0)$ and the booleans
$(\mathbb{B}, \vee, \bot)$, but not $(\mathbb{Z}, +, 0)$. The next
proposition shows that in such monoids, axioms A9-A12
uniquely determine the $-$ operation.
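The three examples can be probed with a small sketch. The relation of Definition 3.1 is always reflexive and transitive, so only antisymmetry can fail. The code below checks antisymmetry on finite samples, with the witness $c$ searched in a bounded pool; the function names and the sample ranges are assumptions made for illustration, and a bounded check is evidence for the samples only, not a proof.

```python
def natural_leq(add, witnesses, a, b):
    """a <= b iff some c satisfies a + c == b (c is searched in `witnesses`)."""
    return any(add(a, c) == b for c in witnesses)


def is_antisymmetric(add, elems, witnesses):
    """True if a <= b and b <= a force a == b on the sampled elements."""
    return all(
        a == b
        or not (natural_leq(add, witnesses, a, b)
                and natural_leq(add, witnesses, b, a))
        for a in elems
        for b in elems
    )


def plus(a, b):
    return a + b


def lor(a, b):
    return a or b


# Sample of (N, +, 0): the witness b - a, when it exists, lies in range(5).
nat_ok = is_antisymmetric(plus, range(5), range(5))
# (B, or, false) is finite, so this check is exhaustive.
bool_ok = is_antisymmetric(lor, [False, True], [False, True])
# Sample of (Z, +, 0): negative witnesses make every pair comparable both ways.
int_ok = is_antisymmetric(plus, range(-2, 3), range(-4, 5))

print(nat_ok, bool_ok, int_ok)
```

In the integer sample, 0 and 1 are related in both directions (via $c = 1$ and $c = -1$), which is exactly why $(\mathbb{Z}, +, 0)$ is not naturally ordered.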



¹ This was obviously known to the authors of [13]; in fact, their
citation of [3] led us to Bosbach's work.



Proposition 3.2 ([6] via [3]) Let $(K, +, 0)$ be a naturally
ordered commutative monoid. For any binary operation $a - b$
on K the following are equivalent:

(i) For all $a, b$, $a - b$ is the smallest $c$ such that $a \le b + c$.

(ii) For all $a, b, c$ we have $a - b \le c$ iff $a \le b + c$.

(iii) Axioms A9-A12 hold.
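For the natural numbers, where the natural order coincides with the usual one, the operation singled out by the proposition is truncated subtraction. The following sketch checks conditions (i) and (ii), and the axioms A11 and A12 as printed in Figure 2(b), on a finite range. The name `monus_nat` and the bounds 8 and 16 are illustrative assumptions.

```python
def monus_nat(a, b):
    """Truncated subtraction on N: the smallest c with a <= b + c."""
    return max(a - b, 0)


N = range(8)

# Condition (i): monus_nat(a, b) is the least c such that a <= b + c.
# For a, b < 8 the least such c is below 16, so the search pool suffices.
cond_i = all(
    monus_nat(a, b) == min(c for c in range(16) if a <= b + c)
    for a in N for b in N
)

# Condition (ii): a - b <= c  iff  a <= b + c.
cond_ii = all(
    (monus_nat(a, b) <= c) == (a <= b + c)
    for a in N for b in N for c in N
)

# (A11) a + (b - a) = b + (a - b)
a11 = all(a + monus_nat(b, a) == b + monus_nat(a, b) for a in N for b in N)

# (A12) a - (b + c) = (a - b) - c
a12 = all(
    monus_nat(a, b + c) == monus_nat(monus_nat(a, b), c)
    for a in N for b in N for c in N
)

print(cond_i, cond_ii, a11, a12)
```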

[13] defines m-semirings as commutative semirings whose
additive monoid is naturally ordered and satisfies
condition (i) in Proposition 3.2. Therefore, in an
m-semiring the $-$ operation is completely determined by
the $+$ operation. Bosbach's characterization implies:

Corollary 3.3 $(K, +, \cdot, 0, 1, -)$ is an m-semiring iff
A1-A12 hold.

In the sequel, for ease of reading we will not dis- 
tinguish between semirings and their extensions to m- 
semirings (when such extension is possible); when this 
extension is possible we simply say that a particular 
semiring is an m-semiring. 

Of course, $(\mathbb{B}, \vee, \wedge, \bot, \top)$, $(\mathbb{N}, +, \cdot, 0, 1)$, and
$(\mathbb{R}_+, +, \cdot, 0, 1)$ are all m-semirings (but
$(\mathbb{Z}, +, \cdot, 0, 1)$ and $(\mathbb{R}, +, \cdot, 0, 1)$ are not). The
semiring of provenance polynomials [16]
$(\mathbb{N}[X], +, \cdot, 0, 1)$ is also an m-semiring (although it
lacks the universal property that it enjoys among
commutative semirings [13]; see the discussion in the last
section). Any boolean algebra is an m-semiring, with
$a - b = a \wedge \neg b$. Moreover, any completely distributive
lattice is an m-semiring because

$b + \inf\{c \mid a \le b + c\} = \inf\{b + c \mid a \le b + c\} \ge a$

In particular, the fuzzy semiring fuzz
$= ([0, 1], \max, \min, 0, 1)$ is an m-semiring. Finally, any
finite distributive lattice is complete, hence completely
distributive, hence an m-semiring. In particular, the
following are also of interest: (1) the m-semiring of all
positive boolean expressions over a set of variables X,
PosBool[X], (2) the three-valued logic TVL, and (3) the
security semiring $\mathbb{S} = (S, \min, \max, 0_s, 1_s)$, where $S$
is the ordered set $1_s < C < S < T < 0_s$ whose elements
have the following meaning when used as annotations:
$1_s$: public ("always available"), C: confidential,
S: secret, T: top secret, and $0_s$ means "never
available" [12].

Additional m-semirings of interest are the tropical
semiring $\mathbb{T} = (\mathbb{N}^\infty, \min, +, \infty, 0)$, the
why-provenance semiring [8] and the Trio semiring [5] (in
[14] Green shows that why and Trio provenance can be
captured via semirings), and the boolean expressions
semiring Bool[X].
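On a finite structure, condition (i) of Proposition 3.2 can be evaluated by exhaustive search. The sketch below does so for a small boolean algebra of sets, where the result should agree with $a \wedge \neg b$, and for the security semiring. The function names, the choice of a three-element universe, and the encoding of the security levels as ranks 0 to 4 are assumptions made for illustration.

```python
from itertools import combinations


def natural_leq(add, elems, a, b):
    """Natural order: a <= b iff a + c == b for some c."""
    return any(add(a, c) == b for c in elems)


def monus(add, elems, a, b):
    """Least c (in the natural order) with a <= b + c, by exhaustive search."""
    candidates = [c for c in elems if natural_leq(add, elems, a, add(b, c))]
    least = [c for c in candidates
             if all(natural_leq(add, elems, c, d) for d in candidates)]
    assert len(least) == 1, "no unique least element"
    return least[0]


# Boolean algebra of subsets of {1, 2, 3}: + is union, natural order is inclusion.
universe = [1, 2, 3]
subsets = [frozenset(s) for r in range(4) for s in combinations(universe, r)]


def set_union(a, b):
    return a | b


# The searched monus should coincide with set difference, i.e. a AND NOT b.
bool_alg_ok = all(
    monus(set_union, subsets, a, b) == a - b for a in subsets for b in subsets
)

# Security semiring: 1_s < C < S < T < 0_s encoded as ranks 0..4, + is min.
PUBLIC, C, S, T, NEVER = range(5)
levels = [PUBLIC, C, S, T, NEVER]
sec_monus = {(a, b): monus(min, levels, a, b) for a in levels for b in levels}

print(bool_alg_ok, sec_monus[(S, T)], sec_monus[(T, S)])
```

Because addition is min in the security semiring, its natural order is the reverse of the listed security order, so the additive zero `NEVER` is the least element of the natural order even though it has the highest rank.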

While almost all the semirings considered in conjunction
with positive queries are m-semirings, satisfying the
axiom A13 is another story. Table 3 summarizes our results
on satisfaction of A13 for the above m-semirings. We next
prove this characterization for some of these semirings
(the proofs for the rest use similar techniques and are
omitted for lack of space).

Proposition 3.4 If $(K, +, \cdot, 0, 1, -)$ is a distributive
lattice that is an m-semiring and has two elements $a, b$
s.t. $a > b$ and $(a - b) \cdot b \neq 0$ then A13 fails in K.

Proof. Recall that in a distributive lattice $+$ corresponds
to max and $\cdot$ to min. Indeed, we obtain $(a - b) \cdot b \neq 0$
but $a \cdot b - b \cdot b = b - b = 0$.

Corollary 3.5 A13 fails in the security m-semiring $\mathbb{S}$,
the m-semiring of positive boolean expressions PosBool[X],
and the fuzzy m-semiring fuzz.

Proof. [sketch] In the security m-semiring, $a - b = a$ if
$a$ is less secure than $b$, and $0_s$ otherwise. For $a = S$
and $b = T$ we obtain $a - b = S$ and
$(a - b) \cdot b = T \neq 0_s$.

For PosBool[X], let $x, y, z \in X$ be three distinct
variables and $a = x \vee y \vee z$, $b = x \vee y$. We obtain
$a - b = z$, and $(a - b) \cdot b = z \wedge (x \vee y) \neq 0$.

For fuzz, $a - b = a$ if $a > b$ (and 0 otherwise). Any two
values $a > b \neq 0$ satisfy the requirement.
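The counterexamples for the security and fuzzy m-semirings can be found mechanically. The sketch below enumerates all triples violating A13, using the closed forms of the monus stated in the proof. The rank encoding of the security levels and the five-point grid standing in for $[0, 1]$ are assumptions made for illustration.

```python
# Security levels 1_s < C < S < T < 0_s encoded as ranks 0..4 (assumed encoding).
PUBLIC, C, S, T, NEVER = range(5)


def sec_monus(a, b):
    """a - b = a if a is less secure than b, and 0_s otherwise."""
    return a if a < b else NEVER


def fuzz_monus(a, b):
    """a - b = a if a > b, and 0 otherwise."""
    return a if a > b else 0.0


def a13_counterexamples(elems, mul, monus):
    """All triples (a, b, c) with a.(b - c) != a.b - a.c."""
    return [
        (a, b, c)
        for a in elems for b in elems for c in elems
        if mul(a, monus(b, c)) != monus(mul(a, b), mul(a, c))
    ]


# Security semiring: multiplication is max on ranks.
sec_bad = a13_counterexamples(range(5), max, sec_monus)

# Fuzzy semiring on a finite grid (closed under max and min): multiplication is min.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
fuzz_bad = a13_counterexamples(grid, min, fuzz_monus)

print((T, S, T) in sec_bad, (0.25, 0.5, 0.25) in fuzz_bad)
```

The triple `(T, S, T)` is the witness from the proof read through A13: the left-hand side is $T \cdot (S - T) = T$, while the right-hand side is $T \cdot S - T \cdot T = T - T = 0_s$.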

The security semiring is of particular interest, but A13
does not hold there. One practical solution for this
particular case is to work with an alternative, "good"
semiring
$\mathbb{S}' = (\mathcal{P}(S \setminus \{0_s\}), \cup, \cap, \emptyset, S \setminus \{0_s\})$.
The elements of $\mathbb{S}'$ are all subsets of security
credentials, and the idea is that every tuple is
annotated, explicitly, with the credentials of all users
that are allowed to see it (the empty set has the
interpretation of "never available"). In particular, it is
easy to embed the annotations of $\mathbb{S}$ in $\mathbb{S}'$: every
element $s$ in $\mathbb{S}$ is mapped to the set of all elements
that are greater than or equal to $s$ according to the
order relation on $S$. We may now use set difference as the
difference operator, and can show that the obtained
m-semiring satisfies A1-A13. Note that a downside here is
that the annotations in $\mathbb{S}'$ are larger than those in
$\mathbb{S}$.
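A brute-force sketch of this construction follows. It checks that the embedding turns min into union and max into intersection, and that A13 holds when multiplication is intersection and difference is set difference. The rank encoding of the levels and the name `embed` are assumptions made for illustration.

```python
from itertools import combinations

# Security levels 1_s < C < S < T < 0_s encoded as ranks 0..4 (assumed encoding).
PUBLIC, C, S, T, NEVER = range(5)
creds = [PUBLIC, C, S, T]  # S \ {0_s}: the credentials a user may hold
subsets = [frozenset(x) for r in range(5) for x in combinations(creds, r)]


def embed(s):
    """Map a level s to the set of all credentials greater than or equal to s."""
    return frozenset(c for c in creds if c >= s)


# The embedding maps + (min) to union and . (max) to intersection.
hom_ok = all(
    embed(min(a, b)) == (embed(a) | embed(b))
    and embed(max(a, b)) == (embed(a) & embed(b))
    for a in range(5) for b in range(5)
)

# It also maps 0_s to the empty set and 1_s to the full set of credentials.
zero_one_ok = embed(NEVER) == frozenset() and embed(PUBLIC) == frozenset(creds)

# (A13) a . (b - c) = a . b - a . c with . = intersection, - = set difference.
a13_ok = all(
    (a & (b - c)) == ((a & b) - (a & c))
    for a in subsets for b in subsets for c in subsets
)

print(hom_ok, zero_one_ok, a13_ok)
```

The cost mentioned in the text is visible here: an annotation of the original semiring is one of five levels, whereas an annotation of the alternative semiring is one of sixteen subsets.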

4 Related Work 

Provenance information has been extensively studied in
the database literature. Different provenance management
techniques have been introduced, and we discussed them
above in the context of their semiring representation
[14]. Several semantics of provenance-aware difference
have been proposed. Our result shows that no semantics can
satisfy all axioms A1-A13 above for every semiring. We have
already shown that the monus semantics [13] fails (in
general) at A13; we next identify where other suggested
semantics fail.

Z semantics [15]. In [15] the authors suggest a semantics
for difference on Z-relations, i.e. relations annotated by
integers. In a nutshell, the semantics defines the
annotation of a tuple $t$ in the result of the relational
difference $R - S$ to be its annotation in $S$ subtracted
from its annotation in $R$; the resulting annotation may be
negative. This definition fails to satisfy axioms A10 and
A11.

Semantics based on aggregate queries [4]. In [4] we have
suggested a semantics for queries with aggregation. Since
nested aggregation queries can encode queries with
difference, we have consequently obtained a semantics for
difference. Intuitively, our definition entails that a
tuple $t$ appears in $R - S$ if it appears in $R$ but does
not appear in $S$. When the tuple appears in the result of
$R - S$, it carries its original annotation from $R$; i.e.,
the existence of $t$ in $S$ is used as a boolean condition.
We can observe that this semantics fails to satisfy axiom
A11, while the other axioms hold.
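Both observations can be illustrated on integer annotations with the axioms printed in Figure 2(b). The sketch below models the Z semantics as plain subtraction and, as a simplifying assumption, models the aggregate-based semantics on bag annotations as "keep the annotation from R when the annotation in S is zero". The function names and the sample ranges are hypothetical, and the second operator is only an approximation of the definition in [4].

```python
def a11(minus, a, b):
    """(A11) a + (b - a) = b + (a - b)"""
    return a + minus(b, a) == b + minus(a, b)


def a12(minus, a, b, c):
    """(A12) a - (b + c) = (a - b) - c"""
    return minus(a, b + c) == minus(minus(a, b), c)


def a13(minus, a, b, c):
    """(A13) a . (b - c) = a . b - a . c"""
    return a * minus(b, c) == minus(a * b, a * c)


def z_minus(a, b):
    """Z semantics: plain integer subtraction, possibly negative."""
    return a - b


def cond_minus(a, b):
    """Assumed model of the aggregate-based semantics on N:
    the tuple keeps R's annotation a iff it is absent from S (b == 0)."""
    return a if b == 0 else 0


Z = range(-3, 4)
N = range(4)

z_a11 = all(a11(z_minus, a, b) for a in Z for b in Z)
cond_a11 = all(a11(cond_minus, a, b) for a in N for b in N)
cond_a12 = all(a12(cond_minus, a, b, c) for a in N for b in N for c in N)
cond_a13 = all(a13(cond_minus, a, b, c) for a in N for b in N for c in N)

print(z_a11, cond_a11, cond_a12, cond_a13)
```

Under plain subtraction the two sides of A11 reduce to $b$ and $a$ respectively, so the axiom fails whenever $a \neq b$. For the conditional operator, $a = 1$ and $b = 2$ give 1 on the left and 2 on the right.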

Of course, no semantics is necessarily "better" than
another, and the choice of semantics thus depends on the
application and on the desired axioms.

5 Conclusions and further work 

The provenance polynomials semiring $\mathbb{N}[X]$ was shown in
[16] to serve as a "universal" provenance annotation domain
for positive relational algebra queries. [13] introduced
m-semirings to deal with relational difference and noted
that while $\mathbb{N}[X]$ is also an m-semiring, it lacks the
universality property. Since m-semirings form an equational
variety (axiomatized by A1-A12), [13] proposes to take the
free m-semiring, for which there is a standard algebraic
construction, as the "new" $\mathbb{N}[X]$.

We have identified an important and useful algebraic
identity, A13, that fails for several important semirings
that are, nonetheless, m-semirings. This leads us, for
example, to search for alternative semirings for security.

It also follows that the free m-semiring fails to satisfy
A13. By the way, $\mathbb{N}[X]$ does satisfy A13 but it still lacks
the desired universal property. It seems that one should
therefore take the free algebraic structure satisfying
A1-A13 as the universal provenance annotation domain for
the full relational algebra. However, the standard
construction of this structure is awfully uninformative. A
task for the future would be the study of this structure
with the aim of providing more manageable and illuminating
characterizations.